The Bottom-Up Pathway
The bottom-up pathway is the feedforward computation of the backbone ConvNet. One pyramid
level is defined for each stage of the backbone. The output of the last layer of each stage is
used as the reference set of feature maps, which enrich the top-down pathway through lateral
connections.
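As a rough illustration of "one pyramid level per stage", the sketch below computes the spatial size of each stage output. The strides {4, 8, 16, 32} for C2 to C5 are an assumption (they match a ResNet-style backbone), and the helper name `bottom_up_sizes` is hypothetical.

```python
def bottom_up_sizes(height, width, strides=None):
    """Return the spatial size (H, W) of each bottom-up stage output.

    Assumes a ResNet-style backbone whose stages C2..C5 have strides
    4, 8, 16 and 32 relative to the input image, and an input size
    that is divisible by the largest stride.
    """
    if strides is None:
        strides = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}
    return {name: (height // s, width // s) for name, s in strides.items()}


print(bottom_up_sizes(224, 224))
# {'C2': (56, 56), 'C3': (28, 28), 'C4': (14, 14), 'C5': (7, 7)}
```

Each stage halves the resolution of the previous one, which is why a factor-of-2 upsampling is enough to align neighbouring levels in the top-down pathway.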
Top-Down Pathway and Lateral Connection
The top-down pathway produces higher-resolution features by upsampling spatially coarser,
but semantically stronger, feature maps from higher pyramid levels. Specifically, the spatial
resolution is upsampled by a factor of 2, using nearest-neighbor upsampling for simplicity.
Each lateral connection merges feature maps of the same spatial size from the bottom-up
pathway and the top-down pathway.
Specifically, the feature maps from the bottom-up pathway undergo 1×1
convolutions to reduce their channel dimensions.
The feature maps from the bottom-up pathway and the top-down pathway are then merged
by element-wise addition.
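The merge at a single pyramid level can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names, the (channels, height, width) layout, and the bias-free 1×1 convolution are assumptions made for the example.

```python
import numpy as np


def upsample_nearest_2x(x):
    """Nearest-neighbor upsampling by 2: (C, H, W) -> (C, 2H, 2W)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def conv1x1(x, weight):
    """1x1 convolution without bias: a per-pixel linear map over channels.

    x: (C, H, W), weight: (D, C) -> output (D, H, W).
    """
    return np.einsum("dc,chw->dhw", weight, x)


def merge_level(top_down, bottom_up, lateral_weight):
    """Merge one level: upsample the coarser top-down map, project the
    bottom-up map to the same channel count, then add element-wise."""
    return upsample_nearest_2x(top_down) + conv1x1(bottom_up, lateral_weight)
```

For example, with a top-down map of shape (256, 7, 7), a bottom-up map of shape (1024, 14, 14) and a lateral weight of shape (256, 1024), `merge_level` returns a (256, 14, 14) map.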
Prediction in FPN
Finally, a 3×3 convolution is appended to each merged map to generate the final feature
map, which reduces the aliasing effect of upsampling. This final set of feature maps is
called {P2, P3, P4, P5}, corresponding to {C2, C3, C4, C5}, which have the same spatial
sizes, respectively.
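A 3×3 convolution that keeps the spatial size unchanged can be sketched as below, so that each P level stays aligned with its C level. The zero padding of 1, the stride of 1, the missing bias term and the function name `conv3x3_same` are assumptions of this example.

```python
import numpy as np


def conv3x3_same(x, weight):
    """3x3 convolution (cross-correlation, as in deep learning frameworks)
    with stride 1 and zero padding 1, so the spatial size is preserved.

    x: (C, H, W), weight: (D, C, 3, 3) -> output (D, H, W).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W only
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the input seen by kernel tap (dy, dx).
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("dc,chw->dhw", weight[:, :, dy, dx], patch)
    return out
```

Applied to a merged map of shape (256, 14, 14) with a weight of shape (256, 256, 3, 3), the output again has shape (256, 14, 14).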
Because all levels of the pyramid use shared classifiers/regressors, as in a traditional
featurized image pyramid, the feature dimension (number of channels) d of every output map
is fixed at d = 256. Thus, all extra convolutional layers have 256-channel outputs.
Q7. DeepID-Net (Def-Pooling Layer)
Answer:
A new def-pooling (deformation constrained pooling) layer is used to model the deformation of
object parts with geometric constraints and penalties. In other words, besides detecting the whole
object directly, it is also important to detect object parts, which can then assist in detecting the
whole object.
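One simple way to picture pooling with a geometric penalty is a max pooling in which each candidate location is penalized for lying far from where the part is expected. The sketch below is only an illustration of that idea: the quadratic penalty, the `penalty_weight` parameter, the `anchor` argument and the function name `def_pool` are all assumptions, not the exact formulation used in DeepID-Net.

```python
import numpy as np


def def_pool(part_scores, anchor, penalty_weight=0.1):
    """Penalized max pooling over one part-detector response map.

    part_scores: (H, W) responses of a part detector.
    anchor: (row, col) where the part is expected to appear.
    Returns the best penalized score and the location that achieves it.
    """
    h, w = part_scores.shape
    rows, cols = np.mgrid[0:h, 0:w]
    # Assumed quadratic deformation penalty around the anchor position.
    penalty = penalty_weight * ((rows - anchor[0]) ** 2 + (cols - anchor[1]) ** 2)
    penalized = part_scores - penalty
    best = np.unravel_index(np.argmax(penalized), penalized.shape)
    return float(penalized[best]), (int(best[0]), int(best[1]))
```

With `penalty_weight=0` this reduces to ordinary max pooling over the map. A larger weight makes the layer prefer responses close to the expected part position.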